Assisting Bug Triage in Large Open Source Projects Using Approximate String Matching

Author

  • Amir H. Moin
Abstract

In this paper, we propose a novel approach for assisting human bug triagers in large open source software projects by semi-automating the bug assignment process. Our approach employs a simple and efficient n-gram-based algorithm for approximate string matching at the character level. We propose and implement a recommender prototype that collects the natural-language text available in the summary and description fields of previously resolved bug reports and classifies it into a number of separate inverted lists according to the resolver of each issue. These inverted lists serve as vocabulary-based models of the developers' expertise and interests. Given a new bug report, the recommender creates all possible n-grams of its strings and evaluates their similarity to the available expertise models using a number of well-known string similarity measures, namely the Cosine, Dice, Jaccard and Overlap coefficients. Finally, the top three developers are recommended as candidates for resolving the new issue. Experimental results on 5,200 bug reports from the Eclipse JDT project show a weighted average precision of 90.1% and a weighted average recall of 45.5%.

Keywords: software deployment and maintenance; semi-automated bug triage; approximate string retrieval; open source software.
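The pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation. It assumes character trigrams (n=3), lowercase normalization with space padding, and ranking by a single chosen measure. The abstract does not say how the four coefficients are combined. It also uses one shared inverted index mapping each n-gram to the developers who resolved reports containing it, which holds the same information as separate per-developer lists. The names `char_ngrams`, `SIMILARITY` and `TriageRecommender` are hypothetical.

```python
import math
import re
from collections import defaultdict

# The four set-based coefficients named in the abstract, computed from
# o = |X ∩ Y|, x = |X| (query n-grams) and y = |Y| (developer n-grams).
SIMILARITY = {
    "cosine": lambda o, x, y: o / math.sqrt(x * y),
    "dice": lambda o, x, y: 2 * o / (x + y),
    "jaccard": lambda o, x, y: o / (x + y - o),
    "overlap": lambda o, x, y: o / min(x, y),
}


def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Set of character n-grams of `text` (n=3 is an assumption)."""
    # Lowercase, collapse whitespace, and pad with spaces so word
    # boundaries also produce n-grams.
    normalized = " " + re.sub(r"\s+", " ", text.lower()).strip() + " "
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


class TriageRecommender:
    """Hypothetical recommender: one inverted index maps each n-gram to the
    developers whose resolved reports contain it."""

    def __init__(self, n: int = 3):
        self.n = n
        self.index = defaultdict(set)         # n-gram -> set of resolvers
        self.profile_size = defaultdict(int)  # resolver -> distinct n-grams

    def add_resolved_report(self, resolver: str, summary: str, description: str) -> None:
        # Only the summary and description fields are used, as in the abstract.
        for gram in char_ngrams(summary + " " + description, self.n):
            if resolver not in self.index[gram]:
                self.index[gram].add(resolver)
                self.profile_size[resolver] += 1

    def recommend(self, summary: str, description: str,
                  measure: str = "cosine", k: int = 3) -> list[str]:
        query = char_ngrams(summary + " " + description, self.n)
        if not query:
            return []
        # Walk the inverted lists of the query n-grams to count |X ∩ Y|
        # for every developer that shares at least one n-gram.
        overlap = defaultdict(int)
        for gram in query:
            for dev in self.index.get(gram, ()):
                overlap[dev] += 1
        sim = SIMILARITY[measure]
        scored = [(sim(o, len(query), self.profile_size[dev]), dev)
                  for dev, o in overlap.items()]
        # Highest score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda t: (-t[0], t[1]))
        return [dev for _, dev in scored[:k]]


if __name__ == "__main__":
    rec = TriageRecommender()
    rec.add_resolved_report("alice", "NPE in content assist",
                            "Code completion throws NullPointerException")
    rec.add_resolved_report("bob", "Formatter breaks javadoc",
                            "Code formatter wraps javadoc comments incorrectly")
    print(rec.recommend("Content assist NPE",
                        "NullPointerException during code completion"))
```

All four measures differ only in how they normalize the same overlap count. A single pass over the inverted lists of the query's n-grams is therefore enough to score every candidate developer under any of them. Passing `measure="dice"`, `"jaccard"` or `"overlap"` changes the ranking criterion without re-indexing.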

Similar resources

Automatic bug triage using text categorization

Bug triage, deciding what to do with an incoming bug report, is taking up an increasing amount of developer resources in large open-source projects. In this paper, we propose to apply machine learning techniques to assist in bug triage by using text categorization to predict the developer that should work on the bug based on the bug’s description. We demonstrate our approach on a collection of 15,...

A Bug Triage and Localization Technique based on Bug Reports Classification

With a great number of software products that have been developed, bug fixing is difficult due to the large number of bug reports submitted each day. Developers sometimes describe the same errors in different bug reports; these are called duplicate bug reports, and the increasing number of duplicates leads to a large amount of time and effort for identifying and analyzing bug r...

Learning from evolving data streams: online triage of bug reports

Open issue trackers are a type of social media that has received relatively little attention from the text-mining community. We investigate the problems inherent in learning to triage bug reports from time-varying data. We demonstrate that concept drift is an important consideration. We show the effectiveness of online learning algorithms by evaluating them on several bug report datasets collec...

Improved Approach for Predicting the Bug Triage Using Data Reduction Methods

Most software companies need to deal with a vast number of software bugs day to day. This paper can be viewed as an application of instance selection and feature selection in bug repositories. The aim is to address the problem of data reduction for bug triage, and to reduce the scale and improve the quality of bug data. This can be achieved by combining instance selection with feature sele...

An Overview of the Software Engineering Process and Tools in the Mozilla Project

The Mozilla Project is an Open Source Software project which is dedicated to development of the Mozilla Web browser and application framework. Possessing one of the largest and most complex communities of developers among Open Source projects, it presents interesting requirements for a software process and the tools to support it. Over the past four years, process and tools have been refined to...

Publication date: 2012